%0 Conference Proceedings
%4 sid.inpe.br/sibgrapi/2021/09.05.17.11
%2 sid.inpe.br/sibgrapi/2021/09.05.17.11.38
%@doi 10.1109/SIBGRAPI54419.2021.00016
%T Gaze estimation via self-attention augmented convolutions
%D 2021
%A Vieira, Gabriel Lefundes,
%A Oliveira, Luciano,
%@affiliation Federal University of Bahia
%@affiliation Federal University of Bahia
%E Paiva, Afonso,
%E Menotti, David,
%E Baranoski, Gladimir V. G.,
%E Proença, Hugo Pedro,
%E Junior, Antonio Lopes Apolinario,
%E Papa, João Paulo,
%E Pagliosa, Paulo,
%E dos Santos, Thiago Oliveira,
%E e Sá, Asla Medeiros,
%E da Silveira, Thiago Lopes Trugillo,
%E Brazil, Emilio Vital,
%E Ponti, Moacir A.,
%E Fernandes, Leandro A. F.,
%E Avila, Sandra,
%B Conference on Graphics, Patterns and Images, 34 (SIBGRAPI)
%C Gramado, RS, Brazil (virtual)
%8 18-22 Oct. 2021
%I IEEE Computer Society
%J Los Alamitos
%S Proceedings
%K deep learning, gaze estimation, attention-augmented convolutions
%X Although deep learning methods have recently boosted the accuracy of appearance-based gaze estimation, there is still room for improvement in network architectures for this particular task. Hence, we propose a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the features learned during the training of a shallower residual network. The rationale is that the self-attention mechanism can help outperform deeper architectures by learning dependencies between distant regions in full-face images. This mechanism can also create better and more spatially aware feature representations derived from the face and eye images before gaze regression. We dubbed our framework ARes-gaze, which explores our Attention-augmented ResNet (ARes-14) as twin convolutional backbones. In our experiments, results showed a decrease of the average angular error by 2.38% compared to state-of-the-art methods on the MPIIFaceGaze data set, while achieving second place on the EyeDiap data set. Notably, our proposed framework was the only one to reach high accuracy simultaneously on both data sets.
%@language en
%3 gaze_attention_sibgrapi_2021_CAMERA_READY(1).pdf